ABSTRACT

With economic globalization and the continuous
development of e-commerce, customer relationship
management (CRM) has become an important factor
in the growth of a company. CRM requires huge
expenses. One way to profit from a CRM investment
and drive better results is through machine learning.
Machine learning helps businesses manage, understand
and serve customers at the individual level.
Propensity modeling in particular helps a business
increase its marketing performance. The objective is
to propose a new approach for better customer
targeting.

We devise a method to improve the prediction
capabilities of existing CRM systems by improving
classification performance for propensity modeling.

Keywords: Customer Relationship Management
(CRM), Machine Learning, Customer Segmentation,
Customer Targeting, K-means algorithm, SMOTE,
Logistic Regression, Classification, Clustering


and predict which accounts are more likely to
buy. Customer targeting is one of the most important
components of customer relationship management
(CRM) systems. It helps in identifying promising
prospects to generate more revenue. Improving
customer targeting is important for reducing overall
cost and boosting business performance. Marketing
professionals achieve this task using a classification
model for buyer targeting.

1.1  Analysis  Scenario: 

To identify prospective customers, it is important
to measure a subject’s “propensity to buy” a particular
product. We can take advantage of the large amount
of demographic data available to target those who
have the highest propensity to buy, thus improving our
chances of success [9]. We will devise a method that
exploits the customer data in conjunction with the
demographic data from the overall market population,
which contains buyer vs. non-buyer data.


I.  INTRODUCTION 

CRM requires a big expense in the form of
implementation, updates, and training. Implementing
ML on top of a CRM system will not only increase ROI
but will also enable us to derive better results from the
huge collection of data available from sales,
marketing and customer support. This will enable us
to achieve predictive CRM, which could gather both
internal and external data about different prospects


This will increase the predictive accuracy of the
classification model and improve customer
targeting performance.

II.  Related  Work 

One of the key problems in CRM is buyer targeting,
that is, identifying the prospects that are most likely to
become customers. Marketers apply data mining tools
to solve this problem. For example, the authors of [1]
focused on classifying online customers based on
their website behavior, and [2] applied neural
networks guided by genetic algorithms to target
households. [3] proposed a new feature selection
technique and compared the classification
performance of the C4.5 decision tree, Naive Bayes,
SVM and KNN classifiers; the SVM classifier was
found to work best with this methodology. [4]
proposed a hybrid algorithm that uses clustering and
decision tree induction to classify the data samples.
This approach avoids burdening the decision tree
with large datasets by dividing the data samples into
clusters. In [5] the author suggested a customer
classification and prediction model for a commercial
bank that uses collected customer information as
input to make predictions for credit card proposals.
The author implemented a Naive Bayes classifier.
[8] developed individually tailored predictive models
for each segment to maximize targeting accuracy in
the direct-mail industry. In such a step-by-step
approach, buyer targeting (the second step)
becomes dependent on the results of customer
segmentation (the first step). However, the customer
segmentation has to be implemented
independently. [9] proposed to first use K-means
clustering to segment customers and then build
segment-wise predictive models to better target
promising customers. In [12], customer
segmentation and buyer targeting were formulated
as a single unified optimization problem. The
integrated approach not only improves buyer
targeting performance but also provides a new
perspective on segmentation based on the buying
decision preferences of the customers. A new
K-Classifiers Segmentation algorithm was developed
to solve the unified optimization problem.

III.  Preliminaries 

The method proposed here uses the concepts of
clustering and classification in the presence of the
class imbalance problem. The k-means algorithm was
used for customer targeting in [12] and for feature
selection in [4], the SMOTE algorithm was used for
handling class imbalance in [11], and logistic
regression was used as a benchmark for the
comparative analysis of the RFM and FRAC methods
in [13]. These algorithms are described in detail
below.

Algorithms:- 


A. K-means Clustering:-

Real-life datasets have a large number of
features [4], and grouping these features on the basis
of similarity is required. Clustering is an unsupervised
method of separating a large amount of data into
subsets with similar characteristics. Different
clustering methods can generate different groupings
for the same set of data samples. Clustering can be
broadly classified as partition-based or hierarchical.
Examples of partition-based clustering techniques
are k-means and k-medoids. The algorithm proposed
in this paper uses k-means for feature selection.

B.  Logistic  Regression:- 

In the logistic regression model, the predicted values
for the dependent variable always lie between 0 and 1,
inclusive [10].

The name logistic stems from the fact that one can
easily linearize this model via the logistic (logit)
transformation. Suppose we think of the binary
dependent variable y in terms of an underlying
continuous probability p, ranging from 0 to 1. We can
then transform that probability p as:

$$p' = \ln\left(\frac{p}{1 - p}\right)$$


Logistic regression is very useful for several reasons:
(1) logistic modeling is conceptually simple; (2) it is
easy to interpret compared to other methods such as
ANNs; (3) logistic modeling has been shown to provide
good and robust results in comparison studies [6]. For
database marketing applications, several authors [7]
have shown that logistic modeling may outperform
more sophisticated methods.

C.  SMOTE:- 

SMOTE is an over-sampling approach in which the
minority class is over-sampled by creating “synthetic”
examples rather than by over-sampling with
replacement [2, 11]. The minority class is over-sampled
by taking each minority class sample and introducing
synthetic examples along the line segments joining
any or all of its k minority class nearest
neighbors [11, 2]. Depending upon the amount of
over-sampling required, neighbors from the k nearest
neighbors are randomly chosen [2].
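The interpolation step described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the paper, which was written in R; Python is used here purely for illustration. The function name `smote`, its parameters `per_sample`, `k` and `seed`, and the toy data are assumptions introduced for this sketch.

```python
import math
import random


def smote(minority, per_sample=1, k=5, seed=0):
    """Create synthetic minority points by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i, base in enumerate(minority):
        # k nearest minority-class neighbours of the base point (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        for _ in range(per_sample):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # random position on the joining line segment
            synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (hypothetical values)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, per_sample=2, k=2)
```

Each synthetic point lies on a segment between two existing minority samples, so the new data stays inside the region already occupied by the minority class.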


IV. Proposed Model:

With the advancement of technology, it has
become easier to manage business relationships and
the data associated with them. Data such as
customer contact details, account and lead
information, and sales opportunities in different
geographical regions is stored in CRM systems,
mostly in the cloud at one central location, so that the
data is available to many users in real time with ease
and speed. As the amount of data generated by the
sales, marketing, customer support and product
development departments increases exponentially,
the potential of existing CRM systems to generate
more revenue has also increased.

Deriving insights from the available data is important
not only to increase profit but also to save effort
and time. After all, a huge amount of data doesn’t
necessarily mean better decision making.

Fig. 1 shows the block diagram of the
proposed algorithm.

In this paper, we have used the K-means, SMOTE
and Logistic Regression algorithms for customer
targeting. A correct logical combination of these
algorithms improves the performance metrics of the
predictive model.


VI.  Case  Study 

Let us consider a case study in which the marketing
manager of a large company selling bikes must run a
mailing campaign, a form of direct marketing, to find
the best prospective customers.

To build a buyer propensity model, we require data
which must contain:-

a. Sales transaction data: the data about our
customers, including those who bought bikes in
the past.

b. Data from previous marketing campaigns.

c. A list of customers who purchased from
third-party vendors.

This dataset was collected from the Azure ML
gallery at http://gallery.azureml.net/ The dataset has:

Dataset      Observations   Features
BikeBuyer    10000          18

Missing values = 9% and above.

Class imbalance = 85.039% majority class and
14.96% minority class.

VII.  Working 

Handling class imbalance during data cleansing
and processing:

This problem occurs in machine learning when one
class of data has far fewer samples than the other.
This produces a biased classifier. The problem is very
common and is frequently observed in disciplines
like fraud detection.

We handle it with a combination of oversampling and
undersampling.

a) First we identify the majority class in the
given dataset. Then we undersample the majority
class by simply removing some of its samples.
This is done until the minority class becomes some
specified percentage of the majority class.

b) After that, we build up the minority class by
oversampling it, i.e. generating new minority class
instances using the SMOTE algorithm.

c) SMOTE (Synthetic Minority Over-sampling
Technique) produces synthetic instances of the
minority class and thus enables learning from
imbalanced datasets. SMOTE is widely used because
it is simple and effective.

The new data is given as input to k-means.
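Step a) of the procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `undersample_majority`, the parameter `target_ratio` (the desired minority-to-majority ratio) and the toy data, which only mimic the roughly 15%/85% split of the case study, are hypothetical and not taken from the paper.

```python
import random


def undersample_majority(rows, labels, minority_label, target_ratio, seed=0):
    """Randomly drop majority rows until minority/majority reaches target_ratio."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]
    # Largest majority size at which the minority still reaches the target ratio
    keep = min(len(majority_idx), int(len(minority_idx) / target_ratio))
    kept_majority = rng.sample(majority_idx, keep)
    chosen = sorted(minority_idx + kept_majority)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: 15 buyers (label 1) and 85 non-buyers (label 0)
rows = list(range(100))
labels = [1] * 15 + [0] * 85
kept_rows, kept_labels = undersample_majority(rows, labels, minority_label=1,
                                              target_ratio=0.5)
```

With `target_ratio=0.5` the minority class ends up half the size of the majority class; SMOTE would then be applied to the retained minority rows to close the remaining gap.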

Feature selection:-

The k-means algorithm is applied for feature selection.
The reduced dataset is then given as input to a
two-class logistic regression classifier. The proposed
method consists of the following steps:

a) The number of clusters k is determined
using the average silhouette over the entire range of
k that we specify. The mean of all the silhouette
values is calculated for each number of clusters, and
the number of clusters with the highest silhouette
value is picked as the value of k. We use Euclidean
distance as the distance measure of each data point
from the centroid of its cluster.

b) After fixing the number of clusters k, we
apply k-means to each feature of the dataset
individually. For each clustering we measure the
clustering performance using the silhouette metric.

c) We add the feature that gives the best
performance to the final set of features. We then
repeat k-means on the remaining features
individually and pick the feature with the highest
silhouette value.

d) If we have reached the desired number of features,
we stop; otherwise we repeat the procedure
described above. This completes feature selection.
The data is then passed to logistic regression for
classification.
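Steps b) to d) can be sketched as below. This is a minimal pure-Python illustration, not the paper's R implementation. It assumes that k has already been fixed as in step a), and all function names, column names and values are hypothetical.

```python
import random
import statistics


def kmeans_1d(values, k, iters=100, seed=0):
    """Lloyd's k-means on one feature; returns a cluster label per value."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(values)), k)  # needs >= k distinct values
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = statistics.fmean(members)
    return labels


def mean_silhouette(values, labels):
    """Average silhouette width, with |a - b| as the 1-D Euclidean distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return 0.0
    widths = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(abs(v - o) for o in own) / (len(own) - 1)
        b = min(statistics.fmean([abs(v - o) for o in other])
                for key, other in clusters.items() if key != lab)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(widths)


def select_features(columns, k, n_features):
    """Greedy loop: repeatedly keep the remaining feature with the best silhouette."""
    remaining = dict(columns)
    selected = []
    while remaining and len(selected) < n_features:
        scores = {name: mean_silhouette(vals, kmeans_1d(vals, k))
                  for name, vals in remaining.items()}
        best = max(scores, key=scores.get)
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical columns: two with clear group structure, one evenly spread
columns = {
    "age": [20.0, 21.0, 22.0, 23.0, 60.0, 61.0, 62.0, 63.0],
    "noise": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "income": [10.0, 12.0, 11.0, 13.0, 50.0, 52.0, 51.0, 49.0],
}
chosen = select_features(columns, k=2, n_features=2)
```

Because each feature is clustered on its own, the greedy loop amounts to ranking the features by silhouette value and keeping the top ones.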


            y = +1          y = -1
ŷ = +1      TP              FP (Type I)
ŷ = -1      FN (Type II)    TN
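The four cells of this layout are sufficient to compute the evaluation metrics used in the Result and Analysis section. The following Python sketch is an illustration only. The function name `classification_metrics` is an assumption, and the counts are those of the confusion matrix reported in that section (TP = 177, FN = 5, FP = 4, TN = 410).

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics derived from the four confusion-matrix cells."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": (tp + tn) / n,
        "error": (fp + fn) / n,
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = classification_metrics(tp=177, fp=4, fn=5, tn=410)
```

On these counts the sketch gives an accuracy of 587/596 (about 0.985) and an F1-score of 354/363 (about 0.975). These need not match the tabulated results exactly: the table is reported at a threshold of 0.6, and the source does not state the threshold used for the matrix.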

Where  f(x)  is  chosen  to  be  a  linear  model. 

We use logistic regression as a classifier to train our
model. We estimate the coefficients as

$$\hat{\beta} = \arg\min_{\beta_1, \beta_2, \beta_3, \ldots} \; \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + e^{-y_i \sum_{j=1}^{p} \beta_j x_{ij}}\right)$$


Logistic regression adjusts the values of β.

The predictive accuracy can be increased
by regularization. Setting the L1-norm and L2-norm
regularization weights to about 0.01 gave the highest
predictive power.
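The estimation step can be illustrated with plain gradient descent on the logistic loss. This is a sketch under stated assumptions, not the solver used in Azure Machine Learning Studio. Only an L2 penalty is included (the L1 term is omitted for simplicity), labels are coded as +1/-1 as in the loss above, and the function names, learning rate and toy data are hypothetical.

```python
import math


def objective(beta, X, y, l2=0.01):
    """Mean logistic loss log(1 + exp(-y_i * f(x_i))) plus an L2 penalty."""
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(b * v for b, v in zip(beta, xi))
        loss += math.log1p(math.exp(-margin))
    return loss / len(X) + l2 * sum(b * b for b in beta)


def gradient(beta, X, y, l2=0.01):
    """Gradient of the objective with respect to each coefficient."""
    n = len(X)
    grad = [2 * l2 * b for b in beta]
    for xi, yi in zip(X, y):
        margin = yi * sum(b * v for b, v in zip(beta, xi))
        weight = -yi / (1.0 + math.exp(margin))  # d/df of log(1 + e^{-y f})
        for j, v in enumerate(xi):
            grad[j] += weight * v / n
    return grad


def fit(X, y, l2=0.01, lr=0.5, steps=500):
    """Fixed-step gradient descent starting from all-zero coefficients."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(beta, X, y, l2)
        beta = [b - lr * gj for b, gj in zip(beta, g)]
    return beta


# Toy data: first column is a constant 1 (intercept), second a centred feature
X = [[1.0, -1.5], [1.0, -1.0], [1.0, 1.0], [1.0, 1.5]]
y = [-1, -1, 1, 1]
beta = fit(X, y)
```

The penalty weight `l2=0.01` mirrors the value reported above; larger values shrink the coefficients more strongly.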


c) We score our model by computing the score for
each x_i in the test set:

$$f(x_i) = \sum_{j=1}^{p} \beta_j x_{ij}$$

Thus our model can be represented as

$$\text{BikeBuyer} = \frac{1}{1 + e^{-(\beta_1\,\text{age} + \beta_2\,\text{n-cars} + \beta_3\,\text{commute distance} + \cdots + \beta_p\,\text{n-children})}}$$

where β1, β2, β3, etc. represent the coefficients of the
independent variables.

ε denotes the error, which represents the variability in
the data not explained by these variables.

In this equation, BikeBuyer is the probability that a
customer will buy a bike, given his or her input data,
where age, number of cars and commute distance are
among the independent variables.
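To make the scoring step concrete, the sketch below evaluates the model for a single customer. The coefficient values and the customer record are hypothetical and are not fitted values from the case study. The cut-off of 0.6 is the threshold listed in the results table.

```python
import math


def purchase_probability(features, coefficients):
    """Logistic model: probability = 1 / (1 + exp(-sum_j beta_j * x_j))."""
    score = sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients, for illustration only
coefficients = {"age": -0.02, "n_cars": -0.4,
                "commute_distance": -0.1, "n_children": 0.15}
customer = {"age": 35, "n_cars": 1, "commute_distance": 2.0, "n_children": 2}

p = purchase_probability(customer, coefficients)
is_target = p >= 0.6  # classify as a likely buyer above the threshold
```

For this record the linear score is -1, so the predicted probability is 1 / (1 + e), roughly 0.27, and the customer would not be targeted.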

VIII.  Result  and  Analysis 

The method has been implemented in Azure Machine
Learning Studio using the R programming language.

We  represent  confusion  matrix  as:- 


Classification:-

We split our preprocessed data into a training set and
a test set.

We estimate the coefficients and train our model as
follows:

$$\min_{f} \; \sum_{i=1}^{n} \ell\big(y_i, f(x_i)\big) + \text{Regularization}(f)$$


To evaluate the quality of the results, we have used
the following metrics:
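a. Predictive accuracy:- reported through the
misclassification error, with accuracy = 1 - error:

$$\frac{FP + FN}{n} = \frac{1}{n} \sum_{i=1}^{n} 1_{[\hat{y}_i \neq y_i]}$$

b. Sensitivity:-

$$\frac{TP}{\#\text{Pos}} = \frac{\sum_{i=1}^{n} 1_{[y_i = \hat{y}_i \text{ and } y_i = 1]}}{\sum_{i=1}^{n} 1_{[y_i = 1]}}$$

c. Specificity:-

$$\frac{TN}{\#\text{Neg}} = \frac{\sum_{i=1}^{n} 1_{[y_i = \hat{y}_i \text{ and } y_i = -1]}}{\sum_{i=1}^{n} 1_{[y_i = -1]}}$$

d. False Positive Rate:-

$$\frac{FP}{\#\text{Neg}} = \frac{\sum_{i=1}^{n} 1_{[y_i \neq \hat{y}_i \text{ and } y_i = -1]}}{\sum_{i=1}^{n} 1_{[y_i = -1]}}$$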

e. Precision:-

$$\frac{TP}{\#\text{Predicted Positive}} = \frac{\sum_{i=1}^{n} 1_{[y_i = \hat{y}_i \text{ and } \hat{y}_i = 1]}}{\sum_{i=1}^{n} 1_{[\hat{y}_i = 1]}}$$

f. F1-Score:-

$$F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

g. Area under the ROC curve, where the ROC curve
gives the true positive rate for a particular false
positive rate.

The confusion matrix is as follows:-

True Positive    False Negative
177              5

False Positive   True Negative
4                410

Performance Parameter   Results
Accuracy                0.987
Precision               0.989
Threshold               0.6
Recall                  0.967
F1-Score                0.978
Positive Label          Yes
Negative Label          No

X. CONCLUSION

... Management. The study will help the company
analyze and forecast customers’ purchase patterns.
This method is applicable to industries such as
banking, insurance, retail and manufacturing. We
outlined a simplified method to guide marketers and
managers in focusing their advertising and promotion
on those categories of people in order to reduce time
and costs. The case study we evaluated shows the
practical use and usefulness of the model.